Two Bundles Total Fetch Bandwidth

Ported Instruction Cache

Banked Data Cache

Private, Per-Thread 8 Bundle Expansion Queues

Private, Per-Thread Register Files

Int and FP Functional Units

Two Bundles Total Bandwidth



Figure 2

Dynamically invoke a speculative thread

Execute instructions in speculative thread





Figure 3

Allocate a hardware thread context for a speculative thread

Copy live-in values to the hardware thread context register files

Provide the hardware thread context with the address of the first instruction of the speculative thread
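
The three steps above (allocate a context, copy live-in values, provide the first instruction address) can be sketched in C as a software model. This is only an illustration under stated assumptions: the `thread_context` structure, the constants `NUM_CONTEXTS` and `NUM_LIVE_INS`, and the function `spawn_speculative_thread` are hypothetical names that do not appear in the figures, and real hardware would perform these steps without a software loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CONTEXTS 4   /* assumed number of hardware thread contexts */
#define NUM_LIVE_INS 8   /* assumed upper bound on live-in registers   */

/* Hypothetical model of one hardware thread context. */
typedef struct {
    bool     busy;                 /* context currently allocated?      */
    uint64_t regs[NUM_LIVE_INS];   /* private, per-thread register file */
    uint64_t pc;                   /* address of the next instruction   */
} thread_context;

/* Returns the index of the context that was allocated, or -1 if every
 * context is already occupied. */
int spawn_speculative_thread(thread_context ctx[NUM_CONTEXTS],
                             const uint64_t *live_ins, size_t n_live_ins,
                             uint64_t first_insn_addr)
{
    assert(live_ins != NULL && n_live_ins <= NUM_LIVE_INS);

    for (int i = 0; i < NUM_CONTEXTS; i++) {
        if (!ctx[i].busy) {
            /* Step 1: allocate a hardware thread context. */
            ctx[i].busy = true;
            /* Step 2: copy live-in values into its register file. */
            memcpy(ctx[i].regs, live_ins, n_live_ins * sizeof *live_ins);
            /* Step 3: provide the address of the first instruction. */
            ctx[i].pc = first_insn_addr;
            return i;
        }
    }
    return -1;
}
```

Returning -1 when no context is free is a choice made for this sketch; the case where all contexts are occupied is treated separately in a later figure.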




Figure 4

Identify set of delinquent loads

Construct pre-computation slices for the set of delinquent loads

Establish triggers

Figure 5
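
The first step, identifying the set of delinquent loads, can be illustrated with a small C sketch. It assumes a profile that attributes a miss count to each static load and picks the smallest set of loads that covers a chosen percentage of all misses. The `load_profile` structure, the function `select_delinquent_loads`, and the coverage threshold are assumptions made for illustration; the figure does not say how the set is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical profile record for one static load instruction. */
typedef struct {
    uint64_t pc;       /* address of the load instruction            */
    uint64_t misses;   /* cache misses attributed to it by a profile */
} load_profile;

/* qsort comparator: larger miss counts sort first. */
static int by_misses_desc(const void *a, const void *b)
{
    const load_profile *x = a;
    const load_profile *y = b;
    return (x->misses < y->misses) - (x->misses > y->misses);
}

/* Sorts `loads` by descending miss count and returns how many leading
 * entries are needed to cover at least `percent` of all misses.
 * Those leading entries are treated as the delinquent loads. */
size_t select_delinquent_loads(load_profile *loads, size_t n, unsigned percent)
{
    assert(loads != NULL && percent <= 100);

    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += loads[i].misses;

    qsort(loads, n, sizeof *loads, by_misses_desc);

    uint64_t covered = 0;
    size_t k = 0;
    while (k < n && covered * 100 < total * (uint64_t)percent) {
        covered += loads[k].misses;
        k++;
    }
    return k;
}
```

With a profile in which a handful of loads dominate the miss count, the returned set stays small, which keeps the number of pre-computation slices to construct manageable.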



arc = arcs + group_pos;

for( ; arc < stop_arcs; arc += nr_group )
    if( arc->ident > BASIC )
    {
        red_cost = arc->cost - arc->tail->potential + arc->head->potential;

        if( (red_cost < 0 && arc->ident == AT_LOWER) ||
            (red_cost > 0 && arc->ident == AT_UPPER) )
        {
            basket_size++;
            perm[basket_size]->a = arc;
            perm[basket_size]->cost = red_cost;
            perm[basket_size]->abs_cost = ABS(red_cost);
        }
    }

Delinquent Load [numeral illegible]
Delinquent Load [numeral illegible]
Delinquent Load 504


                                      L1 Miss Rate /     L2 Miss Rate /     L3 Miss Rate /
                                      % Capacity Miss    % Capacity Miss    % Capacity Miss

Delinquent Load [numeral illegible]   99.95% / 99.98%    48.06% / 82.78%    67.64% / 97.36%
Delinquent Load [numeral illegible]   80.92% / 97.60%    63.55% / 86.51%    20.04% / 47.88%
Delinquent Load [numeral illegible]   93.10% / 99.1%     45.33% / 74.65%    20.70% / 44.74%



Figure 6



Loop Carried Dependence

loop_top:
404900:          add           r14=r14,r11
404901:          add           r9=r9,r11
404902:          add           r8=r8,r11

404910:          add           r40=r40,r11
404911:          ld4           r17=[r14]
404912:          add           r3=r3,r11

404920:          ld8.s         r2=[r40]
404921:          add           r26=8,r20
404922:          cmp.ltu.unc   p15,p14=r3,r30

404930:          ld8.s         r16=[r9]
404931:          ld4.s         r25=[r8]
404932:          add           r28=1,r50

404940:          add           r24=80,r2
404941:          cmp4.le.unc   p13,p12=r17,r0
404942:          cmp4.ne.unc   p14,p6=1,r17

404950:          ld8.s         r19=[r26]
404951:          nop.f         0
404952:  (p13)   br.cond.dpnt  loop_top

404960:          add           r15=80,r16
404961:          chk.s         r16,.b6_164
404962:          nop.i

404990:          ld4.s         r23=[r24]
404991:          ld4           r21=[r15]
404992:          chk.s         r2,.b6_166

404A62:          br.cond.sptk  loop_top

Trigger

Basic p-slice

                 add           r9=r9,r11
                 ld8.s         r16=[r9]
                 add           r15=80,r16
                 ld4           r21=[r15]

Delinquent Load
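
As a reading aid, the four instructions of the basic p-slice can be mirrored one for one in C. The sketch below assumes that `r9` walks an array of records with stride `r11`, that each record begins with a pointer, and that the value of interest sits 80 bytes past the pointed-to address, which is what the `add r15=80,r16` and `ld4 r21=[r15]` pair suggests. The function name `pslice_step` and these register roles are assumptions, and the sketch also assumes a platform with 8-byte pointers so that the pointer copy matches `ld8.s`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One execution of the basic p-slice, with registers modelled as C
 * variables. `r9` is passed by pointer because its updated value is
 * carried into the next iteration. */
uint32_t pslice_step(const unsigned char **r9, size_t r11)
{
    const unsigned char *r16;
    const unsigned char *r15;
    uint32_t r21;

    assert(r9 != NULL && *r9 != NULL);

    *r9 += r11;                      /* add   r9=r9,r11  : advance to next record */
    memcpy(&r16, *r9, sizeof r16);   /* ld8.s r16=[r9]   : load a pointer         */
    r15 = r16 + 80;                  /* add   r15=80,r16 : field at offset 80     */
    memcpy(&r21, r15, sizeof r21);   /* ld4   r21=[r15]  : the delinquent load    */

    return r21;
}
```

In a speculative thread the loaded value would normally be discarded, since the purpose of the slice is to bring the cache line in early; the sketch returns it only so that the behaviour can be checked.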




Figure 8

Two instances of the same pre-computation slice or two different pre-computation slices spawned within a predetermined window

Generate a new pre-computation slice having a chaining trigger



Figure 9

Add instructions from one of the instances of the same pre-computation slice that modify values used in the next pre-computation slice to the prologue of the new pre-computation slice

Add instructions from one of the instances of the same pre-computation slice that are used to produce the address loaded by the delinquent load to the epilogue of the new pre-computation slice

Insert a "spawn" instruction between the prologue and the epilogue of the new pre-computation slice
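
The resulting layout (prologue, then "spawn", then epilogue) can be sketched in C. The sketch is a single-threaded model under stated assumptions: the `live_ins` structure, the `arc_t` stand-in type, the counter `prefetched`, and the functions `spawn` and `chaining_slice` are hypothetical, and `spawn` simply runs the next slice inline where hardware would start it in another thread context. Stopping at the loop bound is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long cost; } arc_t;   /* stand-in for the loop's data */

/* Values a slice needs from its predecessor (its live-ins). */
typedef struct {
    const arc_t *arcs;   /* base of the array being walked    */
    size_t pos;          /* loop-carried position             */
    size_t stop;         /* loop bound                        */
    size_t stride;       /* step between iterations           */
} live_ins;

size_t prefetched;       /* number of epilogues executed so far */

void chaining_slice(live_ins in);

/* Hypothetical stand-in for the "spawn" instruction. */
void spawn(live_ins next) { chaining_slice(next); }

void chaining_slice(live_ins in)
{
    assert(in.arcs != NULL && in.pos < in.stop && in.stride > 0);

    /* Prologue: only the instructions that modify values used by the
     * next slice, here the loop-carried position. */
    live_ins next = in;
    next.pos = in.pos + in.stride;

    /* Spawn: start the next slice before doing the slow part. */
    if (next.pos < in.stop)
        spawn(next);

    /* Epilogue: produce the address of the delinquent load and touch it. */
    volatile long sink = in.arcs[in.pos].cost;
    (void)sink;
    prefetched++;
}
```

Placing the spawn ahead of the epilogue means the next slice does not wait for this slice's load to complete, so several loads can be outstanding at once.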



Figure 10

All hardware thread contexts occupied

Check for entries in queue for spawned pre-computation slice

No entries in queue for spawned pre-computation slice
    Ignore request

At least one entry in queue for spawned pre-computation slice
    Pre-computation slice occupies entry in queue until the spawned pre-computation slice is assigned to a hardware context
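
The decision flow above can be modelled with a small queue in C. The sketch reads "no entries in queue" as "no free entries" and "at least one entry" as "at least one free entry", which is an interpretation rather than something the figure states. The names `spawn_state`, `request_spawn`, `release_context`, and the sizes `NUM_CONTEXTS` and `QUEUE_SLOTS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CONTEXTS 2   /* assumed number of hardware thread contexts */
#define QUEUE_SLOTS  2   /* assumed capacity of the pending queue      */

typedef struct {
    bool     context_busy[NUM_CONTEXTS];
    uint64_t queue[QUEUE_SLOTS];   /* start addresses of waiting slices */
    size_t   queued;               /* number of occupied queue entries  */
} spawn_state;

typedef enum { SPAWN_STARTED, SPAWN_QUEUED, SPAWN_IGNORED } spawn_result;

/* Handles one spawn request for the slice starting at `slice_addr`. */
spawn_result request_spawn(spawn_state *s, uint64_t slice_addr)
{
    assert(s != NULL);
    for (size_t i = 0; i < NUM_CONTEXTS; i++) {
        if (!s->context_busy[i]) {
            s->context_busy[i] = true;
            return SPAWN_STARTED;
        }
    }
    /* All hardware thread contexts occupied: consult the queue. */
    if (s->queued == QUEUE_SLOTS)
        return SPAWN_IGNORED;               /* no free entry */
    s->queue[s->queued++] = slice_addr;     /* slice waits in the queue */
    return SPAWN_QUEUED;
}

/* Called when context `i` finishes. If a slice is waiting, the context
 * is handed to the oldest one and its address is stored in `*started`. */
bool release_context(spawn_state *s, size_t i, uint64_t *started)
{
    assert(s != NULL && started != NULL && i < NUM_CONTEXTS);
    if (s->queued == 0) {
        s->context_busy[i] = false;
        return false;
    }
    *started = s->queue[0];
    for (size_t k = 1; k < s->queued; k++)
        s->queue[k - 1] = s->queue[k];
    s->queued--;
    return true;   /* context stays busy with the dequeued slice */
}
```

Serving the oldest queued slice first is a choice made for this sketch; the figure only says that a slice keeps its queue entry until it is assigned to a hardware context.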



